ABSTRACT

Peptidases, their substrates and inhibitors are of 
great relevance to biology, medicine and biotechnology.
The MEROPS database (http://merops.sanger.ac.uk)
aims to fulfil the need for an
integrated source of information about these. The 
database has hierarchical classifications in which 
homologous sets of peptidases and protein inhibi- 
tors are grouped into protein species, which are 
grouped into families, which are in turn grouped 
into clans. The database has been expanded to 
include proteolytic enzymes other than peptidases. 
Special identifiers for peptidases from a variety of 
model organisms have been established so that 
orthologues can be detected in other species. A 
table of predicted active-site residue and metal 
ligand positions and the residue ranges of the pep- 
tidase domains in orthologues has been added to 
each peptidase summary. New displays of tertiary 
structures, which can be rotated or have the 
surfaces displayed, have been added to the struc- 
ture pages. New indexes for gene names and pep- 
tidase substrates have been made available. Among 
the enhancements to existing features are the inclu- 
sion of small-molecule inhibitors in the tables of 
peptidase-inhibitor interactions, a table of known 
cleavage sites for each protein substrate, and 
tables showing the substrate-binding preferences 
of peptidases derived from combinatorial peptide 
substrate libraries. 

INTRODUCTION 

The MEROPS database is a manually curated informa- 
tion resource for proteolytic enzymes, their inhibitors and 
substrates. The database can be found at
http://merops.sanger.ac.uk.



A proteolytic enzyme breaks down a polypeptide or 
protein by cleaving peptide bonds. Proteolytic enzymes 
are needed for the survival of all living organisms, and 
are of importance to mankind in the fields of medicine, 
nutrition, agriculture and technology (1). 

The MEROPS database provides a classification and 
nomenclature of proteolytic enzymes and their inhibitors 
that is widely used throughout the academic community. 
The classification of proteolytic enzymes is derived from 
the system developed by Rawlings and Barrett (2). When 
it became apparent that paper publications to update the 
classification were no longer adequate, the database was 
developed at the Babraham Institute (3). The database 
moved to the Wellcome Trust Sanger Institute in 2002 (4).
A classification of the protein inhibitors of peptidases (5)
was added in 2004 (4) and coverage of the mostly
synthetic, small-molecule inhibitors (SMIs) was added in
2008 (6).

Knowledge of the cleavages within protein, peptide and 
synthetic substrates is important for understanding the 
specificity and physiological roles of proteolytic enzymes, 
so the MEROPS database also includes a collection of 
known cleavage sites in substrates (7). Peptidase specificity 
is shown as a WebLogo display (8) and as a table of pref- 
erences for each substrate-binding pocket (6). 
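
As an illustration of how a table of per-pocket preferences can be derived from a collection of known cleavage sites, the following minimal Python sketch counts the residues observed at positions P4 to P4' around each scissile bond. The function names and the input format (a sequence paired with the residue number of P1) are assumptions made for this example and do not describe the MEROPS implementation.

```python
from collections import Counter

# Substrate-binding positions, N-terminal to C-terminal around the scissile bond.
POCKETS = ["P4", "P3", "P2", "P1", "P1'", "P2'", "P3'", "P4'"]


def cleavage_window(sequence, p1_position, flank=4):
    """Return the residues P4..P4' around a scissile bond.

    p1_position is the 1-based residue number of P1, the residue
    immediately N-terminal to the cleaved bond. Positions that fall
    off either end of the sequence are padded with '-'.
    """
    window = []
    for pos in range(p1_position - flank + 1, p1_position + flank + 1):
        window.append(sequence[pos - 1] if 1 <= pos <= len(sequence) else "-")
    return "".join(window)


def pocket_counts(cleavages):
    """Count residues per pocket for (sequence, p1_position) pairs."""
    counts = {pocket: Counter() for pocket in POCKETS}
    for sequence, p1_position in cleavages:
        window = cleavage_window(sequence, p1_position)
        for pocket, residue in zip(POCKETS, window):
            if residue != "-":  # ignore padding beyond the sequence ends
                counts[pocket][residue] += 1
    return counts
```

Counts of this kind are the raw material for both a sequence logo and a preference table; any normalization or significance testing would be an additional step not shown here.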

THE MEROPS CLASSIFICATION SYSTEMS 

Proteolytic enzymes are frequently multi-domain proteins, 
with peptidase activity restricted to a single structural 
domain. Protein inhibitors are also frequently multi- 
domain proteins, often containing multiple, homologous 
inhibitor domains. Throughout the MEROPS database, 
only that portion of the sequence corresponding to a 
single peptidase domain (the 'peptidase unit') or a single 
inhibitor domain (the 'inhibitor unit') is used in sequence 
and structure comparisons. 

The classifications are hierarchical. At the bottom of 
each hierarchy is the peptidase or inhibitor unit. The 
protein to which it belongs that has been most fully 

Table 1. Counts of protein species, families and clans for proteolytic
enzymes and protein inhibitors in the MEROPS database

                   MEROPS 8.5                MEROPS 9.5
                   Peptidases   Inhibitors   Peptidases   Inhibitors
Sequences          140 313      16 337       192 053      17 451
Protein species    3243         589          4202         634
Families           211          67           225          71
Clans              42           32           44           34

The numbers in Release 9.5 of MEROPS (July 2011) are compared to
those in Release 8.5 (August 2009).


characterized biochemically is chosen as a representative 
called a 'holotype'. Sequences considered to represent the 
same protein but from different organisms (i.e. 
orthologues) are grouped as a single protein species ac- 
cording to the criteria set out by Barrett and Rawlings 
(9). A new holotype (and protein species) is identified 
when a protein has been biochemically demonstrated to 
have a different specificity from any other member of the 
same family. For a peptidase, either it cleaves different 
substrates, cleaves the same substrates in different 
places or interacts with a different set of inhibitors; for 
an inhibitor, it interacts with a different set of peptidases 
or binds a peptidase much more tightly. A new identifier is 
also created if the characterized protein has a different 
architecture, or does not cluster on an evolutionary tree 
with other characterized proteins. The numbers of identi- 
fiers set up for peptidases and inhibitors are shown in 
Table 1. 

Homologues [detectable by a sequence similarity search 
using FastA (10), BlastP (11) or HMMER (12)] are 
grouped into a family. A family contains any number of 
homologues. One sequence is chosen as the type example 
of the family, and all sequences in the family are homolo- 
gous to this type example, either directly or transitively. A 
sequence is included in the family if a pairwise alignment 
with an existing member of the family shows a statistically 
significant match, i.e. the expect value is <0.001. 
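
The phrase 'either directly or transitively' means that a family behaves like a connected group: two sequences belong together if a chain of significant pairwise matches links them. The sketch below illustrates this with a small union-find structure in Python. Only the expect-value cutoff of 0.001 comes from the text; the function name, the input format and the batch clustering approach are assumptions for illustration (in practice sequences are added to existing families one at a time).

```python
E_VALUE_CUTOFF = 0.001  # significance threshold quoted in the text


def group_into_families(sequence_ids, pairwise_e_values):
    """Group sequences linked by chains of significant pairwise matches.

    pairwise_e_values maps (id_a, id_b) tuples to the expect value of
    their pairwise alignment. Returns a list of sets, one per family.
    """
    parent = {s: s for s in sequence_ids}

    def find(s):
        # Follow parent links to the representative, halving the path.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for (a, b), e_value in pairwise_e_values.items():
        if e_value < E_VALUE_CUTOFF:
            parent[find(a)] = find(b)  # merge the two groups

    families = {}
    for s in sequence_ids:
        families.setdefault(find(s), set()).add(s)
    return sorted(families.values(), key=lambda members: sorted(members))
```

In this sketch, sequences 'a' and 'c' end up in the same family when each matches 'b' significantly, even if the direct comparison of 'a' with 'c' is not significant.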

The highest level of the hierarchy is that of clan, and all 
sequences within a clan are believed to be derived from the 
same ancestor, even if there is no significant sequence simi- 
larity. The most rigorous criterion for including proteins 
in the same clan is a similar tertiary structure. The DALI 
algorithm and server (13) is used to compare structures, 
and if the z-score from the DALI comparison to that of an 
existing member of a clan is >6.0, the sequence is added to 
that clan. The order of active-site residues is conserved in 
all members of a clan, and where no tertiary structure is 
known, a family may be added to a clan if this is the same. 
A clan can consist of a single family if the tertiary struc- 
ture of a member is unrelated to that of any other peptid- 
ase or protein inhibitor. 

Table 1 shows statistics for release 9.5 of the MEROPS
database. In the 2 years since the previous article (14),
although the number of sequences in the database has
increased by over a third, only 18 new families and 4 new
clans have been added.
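
The growth figures quoted above can be checked against the counts in Table 1. The short Python sketch below does the arithmetic, combining peptidases and inhibitors; the dictionary layout and function names are choices made for this example only.

```python
# Counts copied from Table 1 (peptidases and protein inhibitors).
RELEASE_8_5 = {
    "peptidases": {"sequences": 140313, "species": 3243, "families": 211, "clans": 42},
    "inhibitors": {"sequences": 16337, "species": 589, "families": 67, "clans": 32},
}
RELEASE_9_5 = {
    "peptidases": {"sequences": 192053, "species": 4202, "families": 225, "clans": 44},
    "inhibitors": {"sequences": 17451, "species": 634, "families": 71, "clans": 34},
}


def total(release, key):
    """Sum one count over peptidases and inhibitors."""
    return sum(kind[key] for kind in release.values())


def growth(key):
    """Return (absolute increase, fractional increase) between releases."""
    old = total(RELEASE_8_5, key)
    new = total(RELEASE_9_5, key)
    return new - old, (new - old) / old


for key in ("sequences", "families", "clans"):
    added, fraction = growth(key)
    print(f"{key}: +{added} ({fraction:.1%})")
```

The combined totals give 52 854 additional sequences (an increase of just over one third), 18 additional families and 4 additional clans, in agreement with the text.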



RECENT DEVELOPMENTS 

The website has been redesigned and improved. Frames 
have been removed from some HTML pages so that a user 
can bookmark any page. In addition, a Request Tracker 
ticketing system has been introduced to allow users to 
make comments and suggestions and to report errors. 
This can be accessed via a 'feedback' link present in the 
footer of every page. 

The database has been extended to include proteolytic 
enzymes other than peptidases. Families of self-cleaving
proteins that utilize the peculiar chemistry of asparagine
to break peptide bonds without hydrolysis, known as
'asparagine peptide lyases' (15), have been added to the
database.

Indexes are now provided for peptidase substrates (see 
below) and gene names of peptidases and protein inhibi- 
tors. The gene name index also includes synonyms of the 
names and the locus names from completely sequenced 
genomes. The names are listed alphabetically, along with 
the source organism (with a clickable link to the organism 
page in MEROPS) and the protein name recommended by 
the MEROPS team (with a clickable link to the summary 
page in MEROPS). 

The MEROPS database includes over 44 000 literature 
references, and in addition to links to PubMed and to the 
text of papers from journal websites made available via 
DOI (digital object identifier), links are now made to the 
free text articles in PubMed Central (16). We have also 
implemented a new facility to search our literature collec- 
tion for a specific PubMed identifier. This is available via 
the 'Searches' option on the left-hand menu. On entering a 
PubMed identifier the full reference is returned, plus a list 
of peptidases and inhibitors for which this reference is 
cited in MEROPS, with a link to the MEROPS 
summary page for each. 

Cross-references to other databases 

A number of new cross-references have been established 
between items in the MEROPS database and other 
publicly available databases. Over 200 cross-references
to Wikipedia articles have been set up for individual
peptidases and inhibitors on the relevant summary pages, with
reciprocal links within those Wikipedia pages. New
cross-references have been established between SMIs and 
the ChEBI (17) and DrugBank (18) databases, with 100 
and 40 cross-references, respectively. 

Sequence features 

A new page giving details of species variants of peptidases 
has been created. Whenever a sequence is added to the 
MEROPS collection, several parameters are calculated 
from a BlastP pairwise comparison, including the 
position of active-site residues and the extent of the pep- 
tidase unit. The Sequence Features page presents the 
results as a table (Figure 1). Each row in the table shows 
the following information: the MEROPS sequence identi- 
fier, the scientific name of the source organism, the 
sequence length, the extent of the peptidase (or inhibitor) 
unit relative to the complete coding sequence, the 

MERNUM     Species                        Sequence length  Peptidase unit  Active site residues  Metal ligands        Source
MER199117  Ailuropoda melanoleuca         679              17-679          E469                  H468, H472, E497     GENBANK:EFB13539
MER107691  Brachydanio rerio              686              19-683          E471                  H470, H474, E499     GENBANK:AAH75901
MER101043  Ciona intestinalis             691              10-690          E459                  H458, H462, missing  ENSEMBL:ENSCINP00000017102
MER157852  Dipodomys ordii                686              78-685          E474                  H473, H477, E502     ENSEMBL:ENSDORP00000008420
MER098455  Equus caballus                 687              2-686           E474                  H473, H477, E502     GENBANK:XP_001493585
MER100935  Gasterosteus aculeatus         685              5-675           E470                  H469, H473, E498     ENSEMBL:ENSGACP00000004264
MER001737  Homo sapiens                   689              2-687           E474                  H473, H477, E502     UNIPROT:P52888
MER112464  Macaca mulatta                 648              2-648           G478                  C477, E481, E506     GENBANK:XP_001117760
MER014510  Mus musculus                   687              2-687           E474                  H473, H477, E502     UNIPROT:Q9EPX1
MER100960  Myotis lucifugus               686              2-680           E473                  H472, H476, E501     ENSEMBL:ENSMLUP00000013017
MER101030  Oryzias latipes                687              21-684          E472                  H471, H475, E500     ENSEMBL:ENSORLP00000002985
MER001149  Rattus norvegicus              687              2-687           E474                  H473, H477, E502     UNIPROT:P24155
MER164477  Salmo salar                    685              19-682          E470                  H469, H473, E498     GENBANK:NP_001133368
MER103219  Schistosoma japonicum          334              3-326           E127                  H126, H130, E155     GENBANK:AAX26445
MER101024  Sorex araneus                  665              1-665           E463                  H462, H466, E491     ENSEMBL:ENSSARP00000009994
MER113852  Strongylocentrotus purpuratus  307              37-307          E141                  H140, H144, E169     GENBANK:XP_790202
MER001150  Sus scrofa                     687              2-687           E474                  H473, H477, E502     UNIPROT:P47788
MER177631  Taeniopygia guttata            717              39-709          E504                  H503, H507, E532     GENBANK:XP_002194663
MER107706  Tetraodon nigroviridis         701              1-701           E458                  H457, H461, Q488     GENBANK:CAF91485
MER012094  Xenopus laevis                 685              1-684           E471                  H470, H474, E499     UNIPROT:Q9PTV2
MER079236  Xenopus tropicalis             684              7-683           E470                  H469, H473, E498     UNIPROT:UPI00006A14DA

Figure 1. Sequence features display. The sequence features are shown for orthologues of thimet oligopeptidase.



predicted active-site residues (and metal ligands for a 
metallopeptidase) and the source of the sequence 
included in the MEROPS collection with a link to the 
relevant database. The organism scientific name is click- 
able and takes the user to the relevant organism page in 
MEROPS. For the active-site residues and metal ligands, 
each amino acid is shown in single letter code next to the 
residue number derived from the source sequence. If the 
sequence is a fragment or from a eukaryotic genome 
sequencing project where the automated gene build has 
missed an exon, then absent active-site residues are 
labeled 'missing'. The items are arranged alphabetically 
by species scientific name, but can be re-sorted by the 
MEROPS sequence identifier. 
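
For readers who wish to process rows of the Sequence Features table, the following Python sketch parses the peptidase-unit range and the residue lists as they appear in Figure 1, including the 'missing' label. The function names and the choice to represent a missing residue as None are assumptions made for this example, not part of the MEROPS website or any MEROPS software.

```python
import re

# One-letter amino acid code followed by a residue number, e.g. 'E474'.
RESIDUE = re.compile(r"^([A-Z])(\d+)$")


def parse_residues(field):
    """Parse 'H473, H477, E502' into [('H', 473), ('H', 477), ('E', 502)].

    A residue labelled 'missing' is returned as None.
    """
    residues = []
    for token in field.split(","):
        token = token.strip()
        if not token:
            continue
        if token == "missing":
            residues.append(None)
            continue
        match = RESIDUE.match(token)
        if match is None:
            raise ValueError(f"unrecognised residue: {token!r}")
        residues.append((match.group(1), int(match.group(2))))
    return residues


def parse_unit(field):
    """Parse a peptidase-unit range such as '2-687' into (2, 687)."""
    start, end = field.split("-")
    return int(start), int(end)


def residues_inside_unit(unit, residues):
    """Check that every residue that is present lies within the unit."""
    start, end = unit
    return all(start <= r[1] <= end for r in residues if r is not None)
```

Applied to the Homo sapiens row of Figure 1 (unit 2-687, active site E474, metal ligands H473, H477 and E502), all four residues fall inside the peptidase unit.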

Tertiary structure displays 

When the tertiary structure of a peptidase or a protein 
inhibitor has been solved, and the co-ordinates are avail- 
able from the Protein Data Bank (PDB) (19), a structure 
page is presented at the MEROPS website. Besides a table
of PDB entries, the page also includes a fixed Richardson
image (20) in which helices are shown as red
coils and strands as green arrows, with the active-site
residues (and metal ligands for metallopeptidases) in
ball-and-stick representation. Metal ions, attached
carbohydrates and inhibitors are also displayed where
appropriate. However, a rotating image provides more insight
into a protein structure, and so we now present a rotating
image using AstexViewer (21) alongside the fixed image.
The same structural elements, residues, metals and
carbohydrates are shown in both images, because the command
line input for the AstexViewer is derived from the input
file used for the Richardson image. There is an option to
show the surface of the molecule, and the image can be 
rotated in any direction by clicking on the image and 
holding down the left mouse button. Various other 
options are available by clicking on the right mouse 
button, including changing colors, saving the image and 
measuring the distances between atoms. To be able to use 
the AstexViewer, users must have Java installed. An 
example of the images on a structure page is shown in 
Figure 2. 

PDB   Organism         Resolution  Comment
1NPC  Bacillus cereus  2.00 Å      mature

The catalytic zinc is shown as a light grey CPK sphere. Structural calcium ions are shown as yellow CPK spheres. The zinc ligands are shown in ball-and-stick representation: His392 and His396 in purple and Glu416 in blue. The catalytic Glu393 is shown in blue.

Figure 2. Displays of tertiary structure. The Structure page for thermolysin (M04.001) is shown. The table provides the cross-reference to the PDB
entry, source organism, resolution, a comment and a description of the elements displayed in the images below. The image on the left-hand side is a
rendered Richardson image generated using the programs RasMol (22), Molscript (23) and Render (24). The image on the right shows the surface of
the molecule using the AstexViewer. This image can be rotated in any direction by the user, and the surface hidden by clicking the 'hide surface'
button. The third image shows secondary structure, active-site residues and metal ligands as they appear in the protein sequence.

Figure 3. Growth in number of determined putative peptidase sequences. The curves shown are (A) all sequences, (B) sequences assigned to
identifiers, (C) sequences assigned to identifiers excluding model organisms, (D) all identifiers and (E) identifiers excluding model organisms.

Peptidases from model organisms

Homologues of peptidases and protein inhibitors are 
being sequenced much faster than they can be 
characterized. Consequences of this can be seen in 
Figure 3, which shows the cumulative totals of homo- 
logues of peptidase sequences in the MEROPS database 
since 1998, and the total number of MEROPS peptidase 
identifiers per year. It also shows the number of homo- 
logues that have been assigned to identifiers. Although 
MEROPS identifiers can be applied to species variants, 



Table 2. Counts of peptidase-encoding genes in selected model
organisms

Organism                  Sequences assigned   Sequences assigned   Total
                          to standard MEROPS   to special MEROPS
                          identifiers          identifiers
Homo sapiens              557                  1                    558
Mus musculus              553                  14                   567
Drosophila melanogaster   111                  320                  431
Caenorhabditis elegans    74                   273                  347
Arabidopsis thaliana      125                  452                  577
Saccharomyces cerevisiae  76                   25                   101
Escherichia coli          88                   167                  255


the number of sequences that are unassigned is increasing 
rapidly. Less than half of all putative peptidases can be 
classified at the peptidase level, because the sequences are 
too divergent from that of the holotypes or the protein 
architecture is significantly different. This has led us to 
search for methods to draw attention to enzymes that 
may be suited to biochemical study because they come 
from well-characterized model organisms. An approach 
that we have adopted is to extend the concept of the 
holotype to these uncharacterized proteins. Special 
MEROPS identifiers have been created for all the 
uncharacterized peptidase homologues from a variety 
of model organisms: human, mouse, Drosophila 
melanogaster, Caenorhabditis elegans, Arabidopsis 
thaliana, Saccharomyces cerevisiae and Escherichia coli. 
These special identifiers resemble the standard 
MEROPS identifier except that the first character after 
the dot is the letter 'A' or 'B' so that it is easy to distin- 
guish an identifier for a characterized peptidase from that 
of an uncharacterized one. Such special identifiers have 
not been set up for protein inhibitors or non-peptidase 
homologues. Table 2 shows the number of peptidases 
and putative peptidases in each model organism. In 
total, 1248 special identifiers have been created. Creation 
of special identifiers is useful if orthologues from other 
species can be identified, because a widely distributed 



[Figure 4 image: the 'Display Known Cleavages for a Protein' search page of the MEROPS database, queried with UniProt accession Q6YBV4 (DSPP600, Sus scrofa). The 600-residue sequence is displayed with cleaved bonds marked, followed by a table of annotated cleavages. The rows recoverable from the image are listed below.]

Cleavage site  Peptidase                   Residue range  Reference
344            matrix metallopeptidase-2   1-600          Yamakoshi et al., 2006
376            matrix metallopeptidase-2   1-600          Yamakoshi et al., 2006
391            matrix metallopeptidase-20  1-600          Yamakoshi et al., 2006
472            meprin alpha subunit        1-600          Tsuchiya et al., 2010
472            meprin beta subunit         1-600          Tsuchiya et al., 2010
472            procollagen C-peptidase     1-600          Tsuchiya et al., 2010

Figure 4. Display of cleavages in a protein substrate. Known cleavages in the DSPP600 protein from pig are shown. The full sequence is shown at
the top with cleaved bonds indicated by the 'dagger' symbol. More details of each cleavage are shown in the table beneath.

[Figure 5 image: table of combinatorial-library results for papain with the columns Organism, Comment, Specificity (P4 to P4'), Optimal substrate, Fluorophore or acceptor-donor pair, and Reference. All seven rows are for wild-type papain from Carica papaya. The studies cited are St Hilaire et al., 1999; Choe et al., 2006; Sun et al., 2007; Alves et al., 2001; Menard et al., 1993; Melo et al., 2001; and Del Nery et al., 2000.]

Figure 5. Specificity from combinatorial peptide libraries. The amino acid preferences within each substrate-binding pocket (labeled P4-P4') are
shown for experiments using combinatorial libraries of peptide substrates for the peptidase papain.



putative protein is much more likely to become 
characterized. The special identifiers C26.A17 (GuaA 
protein, E. coli), C26.A19 (PabA protein, E. coli) and 
M20.A11 (AbgB protein, E. coli) have each been applied 
to putative proteins from over 500 species. Figure 3 also 
shows how these special identifiers have helped us to 
cluster putative peptidase sequences as species variants. 
When a peptidase assigned to a special identifier is bio- 
chemically characterized, a standard type of MEROPS 
identifier will be assigned to replace the special one. 



Small-molecule peptidase inhibitors 

A new series of identifiers has been created for SMIs. 
SMIs include naturally occurring compounds such as 
pepstatin, bestatin and amastatin, as well as synthetic in- 
hibitors generated in a laboratory, and so do not lend 
themselves to any form of natural classification, unlike 
the peptidases and protein inhibitors. Instead, each SMI 
is assigned an identifier consisting of an initial J followed 
by a five digit number. For example, pepstatin is J00095 
and ethylenediaminetetraacetic acid is J00149. This allows 
users to connect directly to an SMI summary. 
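
The identifier conventions described in the text (an initial J followed by five digits for an SMI, and the letter 'A' or 'B' as the first character after the dot for a special peptidase identifier) are simple enough to check programmatically. The Python sketch below is one way to do so. The function name and the returned labels are hypothetical, and the sketch does not validate the remainder of a dotted identifier.

```python
import re

# 'J' followed by exactly five digits, e.g. J00095 (pepstatin).
SMI_PATTERN = re.compile(r"J\d{5}")


def classify_identifier(identifier):
    """Classify a MEROPS-style identifier by its surface form.

    Returns 'small-molecule inhibitor', 'special', 'standard' or 'unknown'.
    Only the rules stated in the text are applied; nothing else about
    the identifier is verified.
    """
    if SMI_PATTERN.fullmatch(identifier):
        return "small-molecule inhibitor"
    family, dot, rest = identifier.partition(".")
    if dot and family and rest:
        # Special identifiers have 'A' or 'B' directly after the dot.
        return "special" if rest[0] in "AB" else "standard"
    return "unknown"
```

With these rules, J00095 and J00149 are recognized as SMIs, C26.A17 as a special identifier and M04.001 as a standard one.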

The page of inhibitor interactions, available from most 
peptidase summaries, now includes small molecule as well 
as protein inhibitors. These interactions have been col- 
lected from the literature. For each SMI, a link has been 
provided to the relevant summary. 



Peptidase substrates 

Our collection of known cleavage sites in substrates 
consists of 54 837 cleavages in release 9.5, of which 
48 557 (88.5%) were mapped to identifiers in the 
UniProt database (representing cleavages in 14446 differ- 
ent proteins). The remaining 6281 (11.5%) represent 
cleavages in synthetic substrates. This is an increase of 
15 191 cleavages (27.7%) since release 8.5 (August 2009). 
Substrates have been tagged as physiological, non- 
physiological, pathologic and synthetic, as judged by the 
original authors, unless there is evidence to indicate other- 
wise. It is now possible to filter the substrates listed for 
each peptidase so that only physiologically relevant, 
pathologic, non-physiological or synthetic substrates are 
shown. 

An index of substrate names is now available. This 
lists the name, the UniProt accession, the peptidase 
known to cleave that substrate with a link to the 
summary for that peptidase, and a count of cleavages per- 
formed by each peptidase. On clicking the UniProt iden- 
tifier, the user is presented with a display showing the 
cleavages within the sequence and a table of cleavages. 
The table shows: 

(i) the residue number of the amino acid in the P1
position (i.e. on the left of the scissile bond);

(ii) the name of the peptidase responsible (with a link 
to the relevant peptidase summary); 
(iii) the residue range of the substrate used in the
experiment compared to the complete coding
sequence that is presented in the UniProt entry
(e.g. minus a signal peptide or propeptide for a
mature protein);

(iv) whether the cleavage is thought to be physiological 
or not; 

(v) how the cleavage was determined (using the fol- 
lowing symbols: NT for N-terminal sequencing, 
MS for mass spectroscopy, MU for site-directed 
mutagenesis and CS for theoretical cleavages 
that fit the consensus sequence of a peptidase 
substrate); 

(vi) a comment describing the purpose of the cleavage 
(e.g. 'release of a signal peptide'); and 

(vii) a reference. 

An example of cleavages annotated in a protein substrate 
is shown in Figure 4. 
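
The seven columns listed above map naturally onto a small record type. The following Python sketch shows one possible representation, together with a helper that marks cleaved bonds in a sequence after each P1 residue. The class, field and function names are hypothetical; only the four evidence codes and the meaning of the columns are taken from the text.

```python
from dataclasses import dataclass

# Evidence codes as listed in item (v) above.
EVIDENCE_CODES = {
    "NT": "N-terminal sequencing",
    "MS": "mass spectroscopy",
    "MU": "site-directed mutagenesis",
    "CS": "consensus sequence",
}


@dataclass(frozen=True)
class Cleavage:
    p1_residue: int        # residue number of P1 (left of the scissile bond)
    peptidase: str         # name of the peptidase responsible
    residue_range: tuple   # (start, end) of the substrate used
    physiological: bool    # whether the cleavage is thought to be physiological
    evidence: str          # one of the keys of EVIDENCE_CODES
    comment: str           # purpose of the cleavage
    reference: str         # literature reference

    def __post_init__(self):
        if self.evidence not in EVIDENCE_CODES:
            raise ValueError(f"unknown evidence code: {self.evidence!r}")
        start, end = self.residue_range
        if not start <= self.p1_residue <= end:
            raise ValueError("P1 residue lies outside the residue range")


def mark_cleavages(sequence, cleavages, symbol="|"):
    """Insert a symbol after every P1 residue (1-based numbering)."""
    cut_after = {c.p1_residue for c in cleavages}
    return "".join(
        residue + (symbol if number in cut_after else "")
        for number, residue in enumerate(sequence, start=1)
    )
```

Several peptidases cleaving at the same position, as at residue 472 in Figure 4, yield a single mark in the sequence because the positions are collected into a set.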

The identification of cleavage sites in substrates is 
important not only for determining the physiological 
roles of peptidases, but also for determining the specificity 
of the peptidase, which can help in the design of better and 
more selective synthetic substrates and inhibitors. There 
are now high-throughput techniques for determining pep- 
tidase specificity which automatically calculate preferences 
for each substrate-binding pocket, but do not determine 
the cleavage position in each synthetic peptide. An array 
of different peptides is made that is known as a 'combina- 
torial library of substrates'. Because the cleavage position 
and sequence of each substrate is not known, these cannot 
be entered into the MEROPS collection of substrate cleav- 
ages. However, Poreba and Drag (2010) (25) have 
assembled a collection of peptidase preferences from the 
available literature, and made them available to 
MEROPS. These are presented as a table on each 
relevant peptidase summary. The table lists the source 
organism of the peptidase, a comment (such as whether 
the peptidase was wild-type or recombinant), the specifi- 
city in terms of substrate-binding pockets P4 to P4' with 
the preferred amino acids shown in single letter code, the 
optimal substrate derived from the study, the fluorophore 
attached to the substrate or the acceptor-donor pair of a 
quenched fluorescent substrate, and a reference. Figure 5 
shows an example of the new combinatorial peptides 
display. 
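
A table of this kind can be used to test whether a candidate peptide is compatible with the reported preferences. The Python sketch below treats each pocket as an optional set of accepted residues and checks an eight-residue peptide spanning P4 to P4'. The function name and the example preferences are hypothetical and are not taken from the MEROPS data for any peptidase.

```python
# Positions of an eight-residue peptide spanning the scissile bond.
POCKETS = ("P4", "P3", "P2", "P1", "P1'", "P2'", "P3'", "P4'")


def matches_preferences(peptide, preferences):
    """Return True if every residue is accepted by its pocket.

    preferences maps a pocket name to a set of accepted one-letter
    codes. Pockets that are absent from the mapping accept any residue.
    """
    if len(peptide) != len(POCKETS):
        raise ValueError("peptide must span P4 to P4' (eight residues)")
    for pocket, residue in zip(POCKETS, peptide):
        accepted = preferences.get(pocket)
        if accepted is not None and residue not in accepted:
            return False
    return True


# Hypothetical preferences, for illustration only.
example_preferences = {"P2": {"V", "L"}, "P1": {"R", "K"}}
print(matches_preferences("PPVRASGP", example_preferences))  # True
print(matches_preferences("PPARASGP", example_preferences))  # False
```

Representing an unconstrained pocket by its absence from the mapping keeps broad specificities compact, at the cost of not distinguishing 'any residue accepted' from 'not determined'.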